Abstract Body


Background / Context 

Single-case designs (SCDs) are a class of research methods for evaluating intervention 
effects by taking repeated measurements of an outcome over time on a single case, both before 
and after the deliberate introduction of a treatment. SCDs are used heavily in fields such as 
special education, school psychology, social work, and applied behavior analysis (Busse, 
Kratochwill, & Elliott, 1995; Horner et al., 2005; Kazdin, 2011; Kennedy, 2004; Odom et al., 
2005), frequently in combination with behavioral observations. Indeed, the focus on outcome 
measurements based on direct observation is considered a hallmark of single-case methodology, 
in that treatment impacts on behavioral outcomes often have immediate and recognizable social 
implications for individual participants and the broader populations that they represent 
(Hartmann & Wood, 1990; Horner et al., 2005). Given the prominence of behavioral observation
data in single-case research, primary investigators and meta-analysts need effect size measures 
that are appropriate and interpretable when applied to such data. 

Several different operational procedures are commonly used to record direct observations 
of human behavior, ranging from continuous duration recording to interval recording methods 
(Altmann, 1974; Barlow & Hersen, 1984; Kazdin, 2011). I describe these in more detail below. 
The variations in recording procedures have important implications for meta-analysis. In a 
collection of studies, some studies may have used one recording procedure while others used 
another. In order for a meta-analysis of such a heterogeneous collection of studies to be 
scientifically interpretable, the results of each study must be expressed on a common scale. If the 
basic input units into the meta-analysis — effect sizes — are not comparable across recording 
procedures (or what I term “operationally comparable”), averages across and contrasts between
results based on different recording procedures will be confounded by differences of scale. 

Purpose / Objective / Research Question / Focus of Study 

This methodological research will describe a model for behavioral observation data that 
allows definition of an intuitively interpretable, operationally comparable effect size, the 
prevalence odds ratio (POR). After defining the POR, I describe basic estimators based on data 
from several different recording methods. 

Significance / Novelty of study 

Many different effect sizes have been proposed for meta-analysis of single-case studies, 
but nearly all are subject to serious criticisms (Allison & Gorman, 1993; Beretvas & Chung, 
2008; Shadish, Rindskopf, & Hedges, 2008; Wolery, Busick, Reichow, & Barton, 2010). Current 
proposals for single-case effect sizes can be classified into three broad categories: parametric 
models for single cases (Busk & Serlin, 1992; Center, Skiba, & Casey, 1985; Swaminathan et al., 
2008), hierarchical models for groups of cases (Hedges, Pustejovsky, & Shadish, 2012; Van den 
Noortgate & Onghena, 2003a, 2003b, 2007, 2008), and non-overlap statistics (Parker, Vannest, &
Davis, 2011). Both types of parametric approaches have focused largely on standardized mean
differences, which are appropriate for continuous, interval scale data but less useful when
measurements are discrete or have bounded ranges. Non-overlap statistics, many of which are
inspired by non-parametric test statistics, have been criticized for being uninterpretable as
measures of effect magnitude, as well as for lacking known sampling distributions and for being
sensitive to design features (such as number of repeated measurements in a phase) that are not of
scientific interest (Beretvas & Chung, 2008; Shadish & Rindskopf, 2007; Wolery et al., 2010).
Crucially, none of the current proposals for single-case study effect sizes are designed to address 
the implications of different recording procedures. 

The properties of the recording procedures have long been subject to scrutiny and debate. 
Much of the debate has centered on the theoretical interpretation and practical utility of interval 
recording methods (Altmann, 1974; Harrop, Daniels, & Foulkes, 1990; Mann, Ten Have, 
Plunkett, Meisels, & Have, 1991). The sensitivity of results to variation in recording methods has 
been studied through simulations (e.g., Rapp, Colby-Dirksen, Michalski, Carroll, & Lindenberg,
2008) and through empirical examples (e.g., Rapp et al., 2007), but rarely through explicit 
statistical modeling. The most relevant exception is Rogosa and Ghandour (1991), who used an 
alternating renewal process model (like the one described below) to study the psychometrics of 
behavioral observations; however, these authors focus mostly on behavior frequency measures, 
rather than prevalence. Other authors have used an alternating Poisson process formulation to
study the properties of momentary time sampling (Brown, Solomon, & Stephens, 1977; Griffin 
& Adams, 1983). 

Statistical, Measurement, or Econometric Model: 

In order to define an operationally comparable effect size, I posit a model for the 
sequence of behavioral events that occur over a single observation session; this sequence of 
events in time is sometimes called the behavior stream (Hartmann & Wood, 1990; Rogosa & 
Ghandour, 1991). Based on the behavior stream, I describe the properties of several different 
recording methods. I then define the prevalence odds ratio (POR) and, based on a simple 
between-session model, consider how to estimate the POR from data generated by different 
observation recording procedures. 

Behavior stream data. During a single observation session, the behavior stream can be 
described as follows. Assume that, within session t, events occur sequentially and can be
numbered u = 1, 2, 3, .... Let $D_{tu}$ denote the duration of event u; let $E_{tu}$ denote the length of time
between the end of event u and the beginning of event u + 1, sometimes called the inter-event
time (IET); let $E_{t0}$ denote the length of time until the first event, with $E_{t0} = 0$ if event 1 is occurring at
the beginning of the observation period. The quantities $\{E_{t0}, D_{t1}, E_{t1}, D_{t2}, E_{t2}, D_{t3}, E_{t3}, \ldots\}$ are the
underlying data that describe the behavior stream during session t. Figure 1 depicts the behavior 
stream. 
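
As a purely illustrative sketch, a behavior stream of this kind can be simulated by alternating between inter-event times and event durations. The function name `simulate_behavior_stream`, the exponential distributions, and the parameter values below are assumptions made for the example, not part of the model described in this abstract.

```python
import random

def simulate_behavior_stream(mu, lam, session_length, rng):
    """Return (start, end) times of behavior events within [0, session_length].

    Assumptions for illustration only: event durations are exponential with
    mean mu, inter-event times are exponential with mean lam, and the behavior
    is already occurring at time 0 with probability mu / (mu + lam).
    """
    events = []
    in_event = rng.random() < mu / (mu + lam)
    t = 0.0
    while t < session_length:
        if in_event:
            duration = rng.expovariate(1.0 / mu)
            # Truncate an event that runs past the end of the session.
            events.append((t, min(t + duration, session_length)))
            t += duration
        else:
            t += rng.expovariate(1.0 / lam)
        in_event = not in_event
    return events

rng = random.Random(1)
stream = simulate_behavior_stream(mu=10.0, lam=20.0, session_length=600.0, rng=rng)
print(len(stream), "events; first three:", stream[:3])
```

When the simulated behavior is occurring at time 0, the first event starts at 0, which corresponds to an initial inter-event time of zero.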

Recorded data. I now describe several different recording procedures, denoting the
recorded datum from session t and recording method m as $Y_t^m$, $m \in \{C, M, E, P, W\}$. Assume
that the session is of length T.

• In continuous duration recording, the recorded datum $Y_t^C$ measures the proportion of session
time during which the behavior occurs.

• In momentary time sampling, the observer records the presence or absence of a behavior at
each of K time-points during a session (typically, time-points are equally spaced). The
reported datum $Y_t^M$ measures the proportion of time-points at which the behavior was
occurring (see Figure 2a).

• In event counting, the recorded datum $Y_t^E$ measures the number of times that an event begins
during the course of the session.

• In partial-interval recording, the observer divides the session into K intervals, each of length
L. The recorded datum $Y_t^P$ measures the proportion of intervals during which the behavior
occurred for any length of time (see Figure 2b).

• Whole-interval recording is structured identically to partial-interval recording, except that
each interval is scored only if the behavior occurs for the entire interval (see Figure 2c). The
recorded datum $Y_t^W$ therefore measures the proportion of intervals in which the behavior
occurred for the full duration of the interval.
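
To make the five procedures concrete, the following sketch scores one small, hand-made behavior stream under each of them. The function names, the placement of the momentary time-points at the end of each interval, and the example stream are assumptions chosen for illustration.

```python
def continuous_duration(events, session_length):
    """Proportion of session time during which the behavior occurs."""
    return sum(end - start for start, end in events) / session_length

def momentary_time_sampling(events, session_length, n_points):
    """Proportion of time-points at which the behavior is occurring.

    Assumption: time-points fall at the end of each of n_points equal intervals.
    """
    step = session_length / n_points
    moments = [k * step for k in range(1, n_points + 1)]
    hits = sum(1 for m in moments if any(s <= m <= e for s, e in events))
    return hits / n_points

def event_count(events):
    """Number of events that begin during the session (start time after 0)."""
    return sum(1 for start, _ in events if start > 0)

def partial_interval(events, session_length, n_intervals):
    """Proportion of intervals in which the behavior occurs for any length of time."""
    step = session_length / n_intervals
    scored = 0
    for k in range(n_intervals):
        lo, hi = k * step, (k + 1) * step
        if any(s < hi and e > lo for s, e in events):
            scored += 1
    return scored / n_intervals

def whole_interval(events, session_length, n_intervals):
    """Proportion of intervals in which the behavior occurs for the entire interval."""
    step = session_length / n_intervals
    scored = 0
    for k in range(n_intervals):
        lo, hi = k * step, (k + 1) * step
        if any(s <= lo and e >= hi for s, e in events):
            scored += 1
    return scored / n_intervals

# Hypothetical 60-second session with two events, scored with K = 6.
events = [(5.0, 12.0), (20.0, 45.0)]
print(continuous_duration(events, 60.0))         # 32/60, about 0.53
print(momentary_time_sampling(events, 60.0, 6))  # 4/6
print(event_count(events))                       # 2
print(partial_interval(events, 60.0, 6))         # 5/6
print(whole_interval(events, 60.0, 6))           # 2/6
```

In this particular example, the partial-interval score lies above the proportion of time the behavior occurred and the whole-interval score lies below it.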

Within-session model. I will assume that, within each session, the behavior stream can
be modeled by an equilibrium alternating renewal process (EARP). The EARP is a broad
class of models for describing a behavior that is either present or absent, and has been used to
study the psychometric properties of behavioral observation data (Rogosa & Ghandour, 1991). In
an equilibrium alternating renewal process, it is assumed that IETs are identically distributed
random quantities, that event durations are also identically distributed, and that all IETs and
event durations are mutually independent. For session t of length T, the main parameters of the
model are then the average event duration $\mu_t = E(D_{t1})$ and the average IET $\lambda_t = E(E_{t1})$. Table 1
reports the expectations of each type of recorded data, under the assumptions of the EARP.
Derivations are omitted due to space constraints.
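
For a sense of how the expectations differ across recording methods, the sketch below evaluates them for one hypothetical set of parameter values. The continuous duration, momentary time sampling, and event counting expressions follow from standard renewal theory; the partial- and whole-interval expressions additionally assume exponentially distributed IETs and event durations, which is an assumption introduced here for illustration. The function name and parameter values are likewise hypothetical.

```python
import math

def expected_recorded_data(mu, lam, session_length, interval_length):
    """Expected recorded datum under an equilibrium alternating renewal process.

    mu: average event duration; lam: average inter-event time.
    Assumption for the two interval methods only: exponential IETs (mean lam)
    and exponential event durations (mean mu).
    """
    cycle = mu + lam
    prevalence = mu / cycle
    return {
        "continuous_duration": prevalence,
        "momentary_time_sampling": prevalence,
        "event_counting": session_length / cycle,
        "partial_interval": 1.0 - lam * math.exp(-interval_length / lam) / cycle,
        "whole_interval": mu * math.exp(-interval_length / mu) / cycle,
    }

# Hypothetical values: 10 s average duration, 20 s average IET,
# a 600 s session, and 10 s recording intervals.
for method, value in expected_recorded_data(10.0, 20.0, 600.0, 10.0).items():
    print(f"{method}: {value:.3f}")
```

With these values the prevalence is one third, while the expected partial-interval score is roughly 0.60 and the expected whole-interval score is roughly 0.12, so the interval methods are not on the same scale as prevalence.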

Target effect size. If one has to choose a single parameter to describe session-to-session
changes in the behavior stream, the best candidate may be the prevalence odds ratio, which
measures proportional change in the ratio of $\mu_t$ to $\lambda_t$. For comparing session a to session b,
define:

$$\Omega = \frac{\mu_b / \lambda_b}{\mu_a / \lambda_a} = \frac{\mu_b \lambda_a}{\mu_a \lambda_b} \tag{1}$$

There are several reasons for making the POR the target of inference. First, the POR compares
the prevalence across sessions, and from the point of view of an interventionist, prevalence is
presumably the most substantively important aspect of behavior. This is because reducing the
incidence of an undesirable behavior without changing its prevalence (i.e., fewer incidents of
longer duration) is not a clear improvement. Second, if the average event duration $\mu_t$ is constant
from session to session, the POR has the simple interpretation of a proportionate change in
inter-event rates. For example, if $\mu_a = \mu_b$, then a POR of $\Omega = 1/2$ means that from session a to
session b the average inter-event rate has halved and, equivalently, that the average IET has
doubled. Similarly, if the average inter-event time is constant across sessions, the POR represents
a proportionate change in the event duration. The POR therefore provides an intuitive means of
equating reductions in event duration with increases in IET. This final property is particularly
desirable in a meta-analysis context in which one may wish to compare some interventions that
reduce the duration of an undesirable behavior with others that reduce the frequency of the
behavior. For purposes of meta-analytic modeling, it is often helpful to use scales that have
no upper or lower limit. Because the POR ranges from 0 to positive infinity, it is useful
to work with the log of the prevalence odds ratio rather than the ratio itself. Thus, the target of
estimation and inference is $\omega = \log(\Omega) = \log \mu_b - \log \lambda_b - \log \mu_a + \log \lambda_a$.
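
A small numerical sketch may help fix ideas. It computes the POR as the ratio of session b's prevalence odds (average duration over average IET) to session a's, for two hypothetical scenarios: one in which only the average IET changes and one in which only the average duration changes. The function name and all parameter values are made up for illustration.

```python
import math

def prevalence_odds_ratio(mu_a, lam_a, mu_b, lam_b):
    """POR comparing session a to session b.

    The prevalence odds in a session equal mu / lam, because the prevalence
    is mu / (mu + lam).
    """
    return (mu_b / lam_b) / (mu_a / lam_a)

# Scenario 1 (hypothetical): duration constant at 10 s, average IET doubles from 20 s to 40 s.
por_iet = prevalence_odds_ratio(mu_a=10.0, lam_a=20.0, mu_b=10.0, lam_b=40.0)

# Scenario 2 (hypothetical): IET constant at 20 s, average duration halves from 10 s to 5 s.
por_duration = prevalence_odds_ratio(mu_a=10.0, lam_a=20.0, mu_b=5.0, lam_b=20.0)

print(por_iet, por_duration)  # both equal 0.5
print(math.log(por_iet))      # log-POR, about -0.693
```

Both scenarios give the same POR of 1/2, which is the sense in which the POR equates a halving of event duration with a doubling of the IET.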

Between-session model. I consider a very basic model for a set of sessions. Suppose that
the first $n_0$ observation sessions occur in a baseline phase, which is immediately followed by a
treatment phase consisting of $n_1$ observation sessions. Further assume that within a phase,
observation sessions are independent and identically distributed (i.i.d.), so that
the parameters $\mu_t$ and $\lambda_t$ are constant within each phase.
Admittedly, the assumption that repeated measurements are i.i.d. is probably unrealistically
strong, in that it does not allow for between-session trends or serial dependence among repeated 
measurements. I maintain it here in order to illustrate the relationship between recording 
procedures and estimators. 

I now describe some very simple moment estimators of the log-POR comparing the 
baseline with the treatment phase. The estimators are summarized in Table 2, along with 
approximate delta-method variance estimators. Due to space constraints, I do not comment on all 
recording methods. To begin, note that continuous duration recording and momentary time
sampling provide direct estimates of prevalence, so that $\omega$ can be estimated by taking the
log-odds ratio of the phase means. Event counting estimates incidence rather than prevalence; to
estimate prevalence, it may sometimes be reasonable to assume that the average duration is
known and constant across phases: $\mu_a = \mu_b = \mu$. Under this assumption, the POR is equivalent to
the ratio $\lambda_a / \lambda_b$, which can be estimated as described in Table 2. Because the expectation of
the recorded datum in partial interval recording depends on the full distribution of the IETs,
further parametric assumptions are needed. Assuming that $E_{t1} \sim \mathrm{Exp}(1 / \lambda_t)$, that is, exponential
with mean $\lambda_t$, it follows that
$E(Y_t^P) = 1 - \lambda_t e^{-L / \lambda_t} / (\mu_t + \lambda_t)$. Further assuming a known value for $\mu_a = \mu_b = \mu$ leads to a
moment estimator for $\log(\lambda_a / \lambda_b)$.
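
The point estimators described in this section can be sketched as follows. This is an illustration under the stated assumptions (a known, constant average duration for event counts and partial interval data, and exponential IETs for partial interval data), not a reference implementation. The function names, the bisection solver and its bounds, and the example data are hypothetical, and variance estimation is not shown.

```python
import math

def logit(p):
    return math.log(p) - math.log(1.0 - p)

def mean(values):
    return sum(values) / len(values)

def log_por_prevalence(baseline, treatment):
    """Continuous duration or momentary time sampling: log-odds ratio of phase means."""
    return logit(mean(treatment)) - logit(mean(baseline))

def log_por_event_counts(baseline, treatment, session_length, mu):
    """Event counts: the mean IET in each phase is estimated as T / mean count - mu.

    Requires T / mean count > mu in both phases.
    """
    lam0 = session_length / mean(baseline) - mu
    lam1 = session_length / mean(treatment) - mu
    return math.log(lam0) - math.log(lam1)

def solve_mean_iet(y_bar, interval_length, mu, lo=1e-6, hi=1e9, iters=200):
    """Solve y_bar = 1 - lam * exp(-L / lam) / (mu + lam) for lam.

    Uses bisection on the log scale; the search bounds are arbitrary choices.
    """
    def expected(lam):
        return 1.0 - lam * math.exp(-interval_length / lam) / (mu + lam)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if expected(mid) > y_bar:
            lo = mid  # expected proportion too high, so the mean IET must be longer
        else:
            hi = mid
    return math.sqrt(lo * hi)

def log_por_partial_interval(baseline, treatment, interval_length, mu):
    """Partial interval data: compare the implicitly estimated mean IETs."""
    lam0 = solve_mean_iet(mean(baseline), interval_length, mu)
    lam1 = solve_mean_iet(mean(treatment), interval_length, mu)
    return math.log(lam0) - math.log(lam1)

# Hypothetical session-level data for one case.
print(log_por_prevalence([0.5, 0.6, 0.4], [0.2, 0.3, 0.1]))
print(log_por_event_counts([20, 22, 18], [10, 12, 8], session_length=600.0, mu=0.0001))
print(log_por_partial_interval([0.6, 0.7, 0.5], [0.3, 0.4, 0.2], interval_length=10.0, mu=10.0))
```

A negative value indicates lower prevalence odds in the treatment phase than in the baseline phase.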

Usefulness / Applicability of Method: 

A recent meta-analysis of single-case studies (Shogren, Faggella-Luby, Bae, &
Wehmeyer, 2004) evaluated the effects of allowing children greater autonomy to make choices.
Of the 13 studies in the original meta-analysis, 9 studies (including 27 participants) used one or
more of the recording methods described above to measure participants’ levels of problem
behavior or task disengagement (see Table 3). Table 4 details the observation recording
procedures used for each case, as well as the assumed value of $\mu$ for event count and partial
interval data. Figure 3 presents a forest plot of the log-prevalence odds ratios for each case, along
with the estimated overall average effect size based on a random effects meta-analysis. In this
preliminary analysis, the average effect of allowing choice-making on the log-POR scale is
estimated to be -1.51, with a 95% confidence interval of [-2.01, -1.02]. This overall average effect
corresponds to a reduction of between 64% and 87% in the prevalence odds from the no-choice baseline
condition. The between-case variance in the true effects is estimated to be 1.43 [Q(26) = 378, p <
.0001, $I^2$ = 94%].
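
The conversion from the log-POR scale to a percentage reduction in prevalence odds can be checked with a short calculation. The sketch below uses the estimate and confidence limits reported above; the function name is hypothetical.

```python
import math

def percent_reduction_in_odds(log_por):
    """Percentage reduction in prevalence odds implied by a log-POR value."""
    return 100.0 * (1.0 - math.exp(log_por))

estimate, lower, upper = -1.51, -2.01, -1.02

print(round(percent_reduction_in_odds(estimate)))  # point estimate
print(round(percent_reduction_in_odds(upper)))     # smallest reduction in the interval
print(round(percent_reduction_in_odds(lower)))     # largest reduction in the interval
```

The upper confidence limit corresponds to the smallest reduction (about 64%) and the lower limit to the largest (about 87%).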

Conclusions 

The model described here provides a basis for defining an operationally comparable 
effect size, the prevalence odds ratio, that allows comparisons and meta-analytic summaries of 
treatment effects measured using different behavioral observation recording procedures. The 
model highlights the strong assumptions needed to estimate the effect size based on event 
counting or partial interval recording data. Future work will evaluate the sampling distribution of 
the moment estimators proposed here, as well as considering alternative estimators such as those 
based on making second-moment or full distribution assumptions regarding the event durations 
and inter-event times. I will also consider three-level meta-analytic models to account for
study-level effects.
Table 1

Expectations of recorded datum produced by various recording methods

| Recording method | Expectation | Source |
|---|---|---|
| Continuous duration recording | $E(Y_t^C) = \dfrac{\mu_t}{\mu_t + \lambda_t}$ | Cox (1962, p. 101; see also Rogosa & Ghandour, 1991, p. 226) |
| Momentary time sampling | $E(Y_t^M) = \dfrac{\mu_t}{\mu_t + \lambda_t}$ | Cox (1962, p. 87) |
| Event counting | $E(Y_t^E) = \dfrac{T}{\mu_t + \lambda_t}$ | Cox (1962, p. 46) |
| Partial interval recording | $E(Y_t^P) = \dfrac{\mu_t + \int_0^L \bar{F}_E(v)\,dv}{\mu_t + \lambda_t}$ | Author derivation, based on Cox (1962, p. 85) |
| Whole interval recording | $E(Y_t^W) = \dfrac{\mu_t - \int_0^L \bar{F}_D(v)\,dv}{\mu_t + \lambda_t}$ | Author derivation, based on Cox (1962, p. 85) |

Notes:

T denotes the total length of the session.

$\bar{F}_E(v) = \Pr(E_{t1} > v)$ is the complement of the cumulative distribution function of the IETs.

$\bar{F}_D(v) = \Pr(D_{t1} > v)$ is the complement of the cumulative distribution function of the event durations.
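Table 2

Effect size estimators and variance estimates

| Recording method | Estimate of log-POR | Variance estimate |
|---|---|---|
| Continuous duration recording (1) | $\hat\omega_C = \mathrm{logit}(\bar{y}_1^C) - \mathrm{logit}(\bar{y}_0^C)$ | $\mathrm{Var}(\hat\omega_C) \approx \dfrac{s_{C1}^2}{n_1 (\bar{y}_1^C)^2 (1 - \bar{y}_1^C)^2} + \dfrac{s_{C0}^2}{n_0 (\bar{y}_0^C)^2 (1 - \bar{y}_0^C)^2}$ |
| Momentary time sampling (1) | $\hat\omega_M = \mathrm{logit}(\bar{y}_1^M) - \mathrm{logit}(\bar{y}_0^M)$ | $\mathrm{Var}(\hat\omega_M) \approx \dfrac{s_{M1}^2}{n_1 (\bar{y}_1^M)^2 (1 - \bar{y}_1^M)^2} + \dfrac{s_{M0}^2}{n_0 (\bar{y}_0^M)^2 (1 - \bar{y}_0^M)^2}$ |
| Event counting (2) | $\hat\omega_E = \log\left(\dfrac{T}{\bar{y}_0^E} - \mu\right) - \log\left(\dfrac{T}{\bar{y}_1^E} - \mu\right)$ | $\mathrm{Var}(\hat\omega_E) \approx \dfrac{T^2 s_{E1}^2}{n_1 \left(\mu (\bar{y}_1^E)^2 - T \bar{y}_1^E\right)^2} + \dfrac{T^2 s_{E0}^2}{n_0 \left(\mu (\bar{y}_0^E)^2 - T \bar{y}_0^E\right)^2}$ |
| Partial interval recording (3) | $\hat\omega_P = \log \hat\lambda_0 - \log \hat\lambda_1$ | $\mathrm{Var}(\hat\omega_P) \approx \sum_{i=0}^{1} \dfrac{s_{Pi}^2\, \hat\lambda_i^2 (\mu + \hat\lambda_i)^2}{n_i (1 - \bar{y}_i^P)^2 (\mu \hat\lambda_i + L \mu + L \hat\lambda_i)^2}$ |

Notes:

For recording method $m \in \{C, M, E, P, W\}$, the within-phase means are $\bar{y}_0^m = \frac{1}{n_0} \sum_{t=1}^{n_0} Y_t^m$ and $\bar{y}_1^m = \frac{1}{n_1} \sum_{t=1}^{n_1} Y_{n_0 + t}^m$;
the within-phase sample variances are $s_{m0}^2 = \frac{1}{n_0 - 1} \sum_{t=1}^{n_0} (Y_t^m - \bar{y}_0^m)^2$ and $s_{m1}^2 = \frac{1}{n_1 - 1} \sum_{t=1}^{n_1} (Y_{n_0 + t}^m - \bar{y}_1^m)^2$.

(1) $\mathrm{logit}(p) = \log(p) - \log(1 - p)$.

(2) Assuming a known value for $\mu_0 = \mu_1 = \mu$.

(3) Assuming a known value for $\mu_0 = \mu_1 = \mu$ and that $E_{t1} \sim \mathrm{Exp}(1 / \lambda_i)$. The estimator $\hat\lambda_i$ is defined implicitly as the solution to
$\bar{y}_i^P = 1 - \hat\lambda_i e^{-L / \hat\lambda_i} / (\mu + \hat\lambda_i)$, and must be evaluated numerically except in certain special cases.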
Table 3

Studies in meta-analysis by Shogren et al. (2004)

The last four columns give the number of participants by recording method.

| Study | Citation | Design | Continuous duration recording | Momentary time sampling | Event counting | Partial interval recording |
|---|---|---|---|---|---|---|
| Dun1994 | (Dunlap et al., 1994) | ABAB | | | | 3 |
| Dye1990 | (Dyer, Dunlap, & Winterling, 1990) | ABAB | | | | 3 |
| Fre2001 | (Frea, Arnold, & Vittimberga, 2001) | Multiple baseline | | | 1 | |
| Jol2001 | (Jolivette, Wehby, Canale, & Massey, 2001) | Multiple baseline | | | | 3 |
| Ker2001 | (Kern, Mantegna, Yorndran, Bailin, & Hilt, 2001) | ABAB | | | 1 | 2 |
| Moe1998 | (Moes, 1998) | AB/BA | | | | 4 |
| Pow1997 | (Powell & Nelson, 1997) | ABAB | | 1 | | |
| Rom2002 | (Romaniuk et al., 2002) | ABAB | 5 | | 1 | |
| Sey1996 | (Seybert, Dunlap, & Ferro, 1996) | ABAB | | | | 3 |
Table 4

Cases in meta-analysis by Shogren et al. (2004)

| Study | Case | Recording method | Session length (minutes) | Interval length (seconds) | Assumed $\mu$ (seconds) |
|---|---|---|---|---|---|
| Dun1994 | Ahmad | Partial interval recording | 15 | 10 | 10 |
| | Sven | Partial interval recording | 15 | 10 | 10 |
| | Wendall | Partial interval recording | 15 | 15 | 10 |
| Dye1990 | George | Partial interval recording | 15 | 30 | 10 |
| | Lori | Partial interval recording | 15 | 30 | 10 |
| | Mary | Partial interval recording | 15 | 30 | 10 |
| Fre2001 | Tim | Event counting | 10 | | 0.0001 |
| Jol2001 | Bruce | Partial interval recording | 15 | 10 | 10 |
| | John | Partial interval recording | 15 | 10 | 10 |
| | Nicky | Partial interval recording | 15 | 10 | 10 |
| Ker2001 | Danny | Event counting | 15 | | 0.0001 |
| | Kelly | Partial interval recording | Not reported | 10 | 10 |
| | Shannon | Partial interval recording | 30 | 10 | 10 |
| Moe1998 | Carl | Partial interval recording | 20 | 10 | 10 |
| | Charles | Partial interval recording | 20 | 10 | 10 |
| | Chuck | Partial interval recording | 20 | 10 | 10 |
| | James | Partial interval recording | 20 | 10 | 10 |
| Pow1997 | Evan | Momentary time sampling | 30 | | |
| Rom2002 | Brooke | Continuous duration recording | 5 | | |
| | Christy | Continuous duration recording | 5 | | |
| | Gary | Continuous duration recording | 5 | | |
| | Maggie | Continuous duration recording | 5 | | |
| | Rick | Continuous duration recording | 5 | | |
| | Riley | Event counting | 5 | | 0.0001 |
| Sey1996 | Bob | Partial interval recording | Not reported | 10 | 10 |
| | Maria | Partial interval recording | Not reported | 10 | 10 |
| | Scott | Partial interval recording | Not reported | 10 | 10 |
Figure 1

The behavior stream

Figure 2

Observation recording methods

(a) Momentary time sampling
(b) Partial interval recording
(c) Whole interval recording

Figure 3

Forest plot of log-prevalence odds ratio effect sizes